Topic Identification Using Wikipedia Graph Centrality

نویسندگان

  • Kino Coursey
  • Rada Mihalcea
چکیده

This paper presents a method for automatic topic identification using a graph-centrality algorithm applied to an encyclopedic graph derived from Wikipedia. When tested on a data set with manually assigned topics, the system is found to significantly improve over a simpler baseline that does not make use of the external encyclopedic knowledge.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Galaxysearch - Discovering the Knowledge of Many by Using Wikipedia as a Meta-Searchindex

We propose a dynamic map of knowledge generated from Wikipedia pages and the Web URLs contained therein. GalaxySearch provides answers to the questions we don’t know how to ask, by constructing a semantic network of the most relevant pages in Wikipedia related to a search term. This search graph is constructed based on the Wikipedia bidirectional link structure, the most recent edits on the pag...

متن کامل

Using Encyclopedic Knowledge for Automatic Topic Identification

This paper presents a method for automatic topic identification using an encyclopedic graph derived from Wikipedia. The system is found to exceed the performance of previously proposed machine learning algorithms for topic identification, with an annotation consistency comparable to human annotations.

متن کامل

Efficient Computation of Relationship-Centrality in Large Entity-Relationship Graphs

Given two sets of entities – potentially the results of two queries on a knowledge-graph like YAGO or DBpedia– characterizing the relationship between these sets in the form of important people, events and organizations is an analytics task useful in many domains. In this paper, we present an intuitive and efficiently computable vertex centrality measure that captures the importance of a node w...

متن کامل

A Wikipedia Based Semantic Graph Model for Topic Tracking in Blogsphere

There are two key issues for information diffusion in blogosphere: (1) blog posts are usually short, noisy and contain multiple themes, (2) information diffusion through blogosphere is primarily driven by the “word-of-mouth” effect, thus making topics evolve very fast. This paper presents a novel topic tracking approach to deal with these issues by modeling a topic as a semantic graph, in which...

متن کامل

Predicting Central Topics in a Blog Corpus from a Networks Perspective

In today’s content-centric Internet, blogs are becoming increasingly popular and important from a data analysis perspective. According to Wikipedia, there were over 156 million public blogs on the Internet as of February 2011. Blogs are a reflection of our contemporary society. The contents of different blog posts are important from social, psychological, economical and political perspectives. ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009